Topic 6: Limitations of Linear Regression
London School of Economics and Political Science
December 1, 2025
Understanding when OLS breaks down shapes how we interpret every regression
\[\widehat{\text{oil price}}_{t+1} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{inventory}_{t}\]
She omits variables like geopolitical tensions, OPEC decisions, and dollar strength.
Should she worry about omitted variable bias?
The answer depends entirely on her objective
Projection model:
\[y = x'\beta + e\]
\[\mathbb{E}[x \cdot e] = 0\]
Regression model:
\[y = x'\beta + e\]
\[\mathbb{E}[e \mid x] = 0\]
Orthogonality:
\[\mathbb{E}[x \cdot e] = 0\]
Exogeneity:
\[\mathbb{E}[e \mid x] = 0\]
Exogeneity \(\implies\) Orthogonality, but not the reverse
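A small simulation can make the one-way implication concrete. The sketch below uses a constructed example (an assumption, not from the lecture): \(x\) is standard normal and \(e = x^2 - 1\), so \(\mathbb{E}[x \cdot e] = 0\) holds while \(\mathbb{E}[e \mid x] = x^2 - 1\) is clearly not zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Constructed example (assumption): e is a deterministic function of x.
x = rng.standard_normal(n)
e = x**2 - 1.0  # mean zero, but depends on x

# Orthogonality: E[x * e] = E[x^3] - E[x] = 0 for a standard normal x.
orthogonality_moment = np.mean(x * e)

# Exogeneity would need E[e | x] = 0 at every x; here E[e | x] = x^2 - 1.
mean_e_near_zero = e[np.abs(x) < 0.5].mean()  # conditional mean is negative
mean_e_in_tails = e[np.abs(x) > 1.5].mean()   # conditional mean is positive

print(f"E[x*e]            ~ {orthogonality_moment:.3f}")
print(f"E[e | |x| < 0.5]  ~ {mean_e_near_zero:.3f}")
print(f"E[e | |x| > 1.5]  ~ {mean_e_in_tails:.3f}")
```

The unconditional moment is close to zero, yet the conditional mean of \(e\) changes sign across regions of \(x\), so orthogonality holds without exogeneity.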
Omitted variable bias is a causal concept—it has no meaning in pure prediction
| Goal | Requires Causation? | Model Needed |
|---|---|---|
| Predict oil price tomorrow | No | Projection |
| Understand what drives prices | Yes | Regression |
For forecasting, causal identification is irrelevant
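The claim can be checked with a rough simulation. Everything in the sketch below is hypothetical: `x` stands in for an observed predictor such as inventory, `z` for an omitted factor correlated with it, and the coefficients are made up. It compares out-of-sample forecasts from the projection of `y` on `x` alone with forecasts that use the true causal coefficient on `x`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical data-generating process (all numbers are assumptions).
x = rng.standard_normal(n)              # observed predictor
z = 0.5 * x + rng.standard_normal(n)    # omitted factor, correlated with x
y = 1.0 - 2.0 * x + 3.0 * z + rng.standard_normal(n)

train, test = slice(0, n // 2), slice(n // 2, n)

# Projection of y on x alone: slope = cov(x, y) / var(x).
slope = np.cov(x[train], y[train])[0, 1] / np.var(x[train], ddof=1)
intercept = y[train].mean() - slope * x[train].mean()

# In-sample residuals are orthogonal to x by construction.
resid = y[train] - (intercept + slope * x[train])
moment = np.mean(x[train] * resid)

# Forecast errors on held-out data.
mse_projection = np.mean((y[test] - (intercept + slope * x[test])) ** 2)
mse_causal = np.mean((y[test] - (1.0 - 2.0 * x[test])) ** 2)

print(f"projection slope = {slope:.3f} (causal coefficient is -2.0)")
print(f"mean(x * resid)  = {moment:.2e}")
print(f"MSE projection   = {mse_projection:.2f}")
print(f"MSE causal coef  = {mse_causal:.2f}")
```

In this setup the projection slope differs sharply from the causal coefficient, yet it gives the lower forecast error when only `x` is available.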
An NGO wants to maximise reach for a health campaign. They model:
\[\widehat{\text{reach}}_i = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{social media budget}_i + \hat{\beta}_2 \cdot \text{demographics}_i\]
They omit variables like local trust in institutions, existing health infrastructure, and cultural factors.
Allocate budget to locations where predicted reach is highest.
\[\mathbb{E}[x \cdot e] = 0 \; \checkmark\]
\[\mathbb{E}[e | x] = 0 \; \text{(required)}\]
The same organisation may need both models for different decisions
“Where should we allocate next year’s budget for maximum reach?”
“Does increasing social media budget cause higher reach?”
Always ask: “Am I trying to predict or to understand causation?”
“If we increase training, will productivity rise?”
\[\text{productivity}_i = \beta_0 + \beta_1 \cdot \text{training}_i + e_i\]
\[\hat{\beta}_1 = 0.15 \quad (p < 0.01)\]
“Each additional hour of training is associated with a 0.15-unit increase in productivity”
\[\mathbb{E}[\hat{\beta}_1] = \underbrace{\beta_1}_{\text{true training effect}} + \underbrace{\gamma_2 \cdot \delta_1}_{\text{firm size effect}}\]
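where \(\gamma_2\) is the effect of firm size on productivity and \(\delta_1\) is the slope from regressing firm size on training. With both positive: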
\[\text{Bias} = (+) \times (+) = (+)\]
Our estimate \(\hat{\beta}_1\) overstates the true training effect
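The bias formula can be verified numerically. The sketch below assumes a linear data-generating process with made-up values for \(\beta_1\), \(\gamma_2\) (effect of firm size on productivity) and \(\delta_1\) (slope of firm size on training). The helper `ols` is defined here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical parameter values (assumptions, chosen so the bias is positive).
beta1, gamma2, delta1 = 1.0, 2.0, 0.5

training = rng.standard_normal(n)
firm_size = delta1 * training + rng.standard_normal(n)
productivity = beta1 * training + gamma2 * firm_size + rng.standard_normal(n)

def ols(y, *regressors):
    """Least-squares coefficients with an intercept in position 0."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Short regression omits firm size; long regression controls for it.
short_slope = ols(productivity, training)[1]
long_slope = ols(productivity, training, firm_size)[1]

print(f"short regression slope = {short_slope:.3f}")
print(f"beta1 + gamma2*delta1  = {beta1 + gamma2 * delta1:.3f}")
print(f"long regression slope  = {long_slope:.3f} (beta1 = {beta1})")
```

The short-regression slope lands near \(\beta_1 + \gamma_2 \delta_1\), while adding firm size as a control recovers a slope near \(\beta_1\).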
What our estimate tells us
Firms with more training have higher productivity.
What it does NOT tell us
Giving more training will increase productivity.
The reality
\[\text{cov}(\text{training}, \text{productivity}) > 0\]
Without controlling for firm size, we cannot distinguish these stories
\[\text{wage}_i = \beta_0 + \beta_1 \cdot \text{beauty}_i + e_i\]
\[e_i = \gamma_1 \cdot \text{wage}_i + \text{other factors}\]
\[\mathbb{E}[e_i | \text{beauty}_i] = \mathbb{E}[\gamma_1 \cdot \text{wage}_i | \text{beauty}_i] \neq 0\]
Beauty correlates with the error because wages affect both
Compare:
\[\Delta\text{wage}_{i,t+1} = \beta_0 + \beta_1 \cdot \text{beauty}_{it} + e_{it}\]
\[\beta_1 = \frac{\text{cov}(\text{beauty}_t, \Delta\text{wage}_{t+1})}{\text{var}(\text{beauty}_t)}\]
Using time as a natural ordering helps establish causality
\[\text{earnings}_i = \beta_0 + \beta_1 \cdot \mathbb{1}[i \text{ is healthy}] + e_i\]
\[\Delta\text{earnings}_{i,t+1} = \beta_0 + \beta_1 \cdot \mathbb{1}[i \text{ is healthy at } t] + e_{it}\]
The arrow of time provides identification when simultaneity threatens
\[\text{cov}(y_i, y_j) = 0 \text{ for } i \neq j\]
\[\mathbb{E}[e_i | x_i] = 0\]
\[\beta_1\]
\[\hat{\beta}_1 = \frac{\widehat{\text{cov}}(x,y)}{\widehat{\text{var}}(x)}\]
\[\hat{\beta}_1 = 0.073\]
We say “biased estimator”—never “biased parameter” or “biased estimate”
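The three objects can be kept apart in code. In the sketch below (parameter value, sample size and error distribution are all assumptions), the parameter is a fixed number, the estimator is a function of the sample, and an estimate is that function's output on one sample. Bias is then a statement about the estimator's average over repeated samples.

```python
import numpy as np

beta1 = 0.5  # the parameter: a fixed number (hypothetical value)

def slope_estimator(x, y):
    """The estimator: a rule mapping any sample to cov(x, y) / var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def draw_sample(rng, n=200):
    """Draw one sample from an assumed model where E[e | x] = 0."""
    x = rng.standard_normal(n)
    e = rng.standard_normal(n)  # independent of x
    return x, 1.0 + beta1 * x + e

rng = np.random.default_rng(3)

# One estimate: the estimator applied to one realised sample.
one_estimate = slope_estimator(*draw_sample(rng))

# Bias concerns the estimator's mean across repeated samples.
estimates = np.array([slope_estimator(*draw_sample(rng)) for _ in range(2000)])
bias = estimates.mean() - beta1

print(f"one estimate           = {one_estimate:.3f}")
print(f"mean of 2000 estimates = {estimates.mean():.3f}")
print(f"approximate bias       = {bias:.4f}")
```

Any single estimate misses the parameter by some amount, but that miss is sampling error. Only the average over repeated samples speaks to bias.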
| Assumption | Statement | Ident. | Estim. | Infer. |
|---|---|---|---|---|
| 1 | Linearity: \(y = \beta_0 + \beta_1 x + e\) | ✓ | ✓ | |
| 2 | Random sampling | ✓ | ✓ | |
| 3 | Variation in \(x\): \(\text{var}(x) > 0\) | ✓ | ✓ | |
| 4 | Zero mean: \(\mathbb{E}[e] = 0\) | ✓ | ✓ | |
| 5 | Exogeneity: \(\mathbb{E}[e \mid x] = 0\) | ✓ | ✓ | |
| 6 | Homoskedasticity: \(\text{var}(e \mid x) = \sigma^2\) | | | ✓ |
| 7 | Normality: \(e \sim N(0, \sigma^2)\) | | | ✓ |
AS1-AS5: Are we estimating something meaningful? AS6-AS7: Is our uncertainty correct?
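A simulation can illustrate the split between the two groups of assumptions. The sketch below assumes a heteroskedastic design, \(e = x \cdot u\) with \(u\) standard normal, so exogeneity holds but \(\text{var}(e \mid x) = x^2\). It compares the slope's actual sampling spread with the classical standard error that presumes homoskedasticity.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, beta1 = 500, 2000, 1.0  # hypothetical settings

slopes = np.empty(reps)
classical_se = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal(n)
    e = x * rng.standard_normal(n)  # E[e | x] = 0, but var(e | x) = x^2
    y = beta1 * x + e

    xc = x - x.mean()
    slopes[r] = (xc @ y) / (xc @ xc)            # OLS slope with intercept
    resid = (y - y.mean()) - slopes[r] * xc     # OLS residuals
    sigma2_hat = (resid @ resid) / (n - 2)
    classical_se[r] = np.sqrt(sigma2_hat / (xc @ xc))

mean_slope = slopes.mean()
true_sd = slopes.std(ddof=1)
avg_classical_se = classical_se.mean()

print(f"mean slope           = {mean_slope:.3f} (beta1 = {beta1})")
print(f"sampling sd of slope = {true_sd:.4f}")
print(f"avg classical SE     = {avg_classical_se:.4f}")
```

In this design the slope stays centred on \(\beta_1\), but the classical standard error understates the true sampling variability: the estimate is still meaningful, while the reported uncertainty is not.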
\[\text{earnings}_i = \beta_0 + \beta_1 \cdot \text{desk number}_i + e_i\]
\[\begin{aligned} H_{0} &: \beta_1 = 0 \\ H_{1} &: \beta_1 < 0 \end{aligned}\]
Survey conducted at alumni meeting
\(\mathbb{P}[i\text{ attended meeting} | \text{earnings}_i] \text{ is increasing in earnings}\)
Why this biases our estimate
If \(\beta_1 < 0\) (front seats → higher earnings):
The consequence
With a random sample of all alumni:
\[\hat{\beta}_1 \approx \beta_1 < 0\]
With a sample selected on earnings:
\[\hat{\beta}_1 > \beta_1\]
The bias is positive, pulling the estimate toward zero
Random treatment assignment doesn’t help when sample selection depends on the outcome
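The selection mechanism can be sketched in a simulation. All numbers below are hypothetical, and the logistic form for the attendance probability is an assumption. The only features taken from the slides are that desk numbers are randomly assigned, \(\beta_1 < 0\), and attendance is more likely for higher earners.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
beta0, beta1 = 60.0, -0.5  # hypothetical values with beta1 < 0

desk = rng.uniform(1, 50, n)  # randomly assigned, independent of the error
earnings = beta0 + beta1 * desk + rng.normal(0, 10, n)

# Attendance probability increasing in earnings (logistic form is an assumption).
p_attend = 1.0 / (1.0 + np.exp(-(earnings - 50.0) / 5.0))
attended = rng.random(n) < p_attend

def slope(x, y):
    """OLS slope of y on x: cov(x, y) / var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

slope_population = slope(desk, earnings)
slope_attendees = slope(desk[attended], earnings[attended])

print(f"slope, all alumni     = {slope_population:.3f}")
print(f"slope, attendees only = {slope_attendees:.3f}")
```

Among attendees, back-row alumni appear only if they drew unusually high earnings, which flattens the estimated slope toward zero even though desk assignment was random.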
What we’ll cover
Why it matters
Understanding which assumption fails guides the solution